Abstract


We consider the following fundamental problems: 

• Constructing k-independent hash functions with a space-time tradeoff close to Siegel’s 
lower bound. 

• Constructing representations of unbalanced expander graphs having small size and allowing 
fast computation of the neighbor function. 

It is not hard to show that these problems are intimately connected in the sense that a good 
solution to one of them leads to a good solution to the other one. In this paper we exploit this 
connection to present efficient, recursive constructions of k-independent hash functions (and 
hence expanders with a small representation). While the previously most efficient construction 
(Thorup, FOCS 2013) needed time quasipolynomial in Siegel’s lower bound, our time bound is 
just a logarithmic factor from the lower bound. 

1 Introduction 

‘Not all those who wander are lost. ’ — Bilbo Baggins. 

The problem of designing explicit unbalanced expander graphs with near-optimal parameters 
is of major importance in theoretical computer science. In this paper we consider bipartite graphs 
with edge set $E \subseteq U \times V$ where $|U| \gg |V|$. Vertices in $U$ have degree $d$ and expansion is desired 
for subsets $S \subseteq U$ with $|S| \le k$ for some parameter $k$. Such expanders have numerous applications 
(e.g. hashing [22], routing [1], sparse recovery [?], membership [2]), yet coming up with explicit 
constructions that have close to optimal parameters has proved elusive. At the same time it is 
easy to show that choosing $E$ at random will give a graph with essentially optimal parameters. 
This means that we can efficiently and with a low probability of error produce a description of an 
optimal unbalanced expander that takes space proportional to $|U|$. Storing a complete description 
is excessive for most applications that, provided access to an explicit construction, would use 
space proportional to $|V|$. On the other hand, explicit constructions can be represented using 
constant space, but the current best explicit constructions have parameters $d$ and $|V|$ that are 
polynomial in the optimal parameters of the probabilistic constructions [?]. Furthermore, existing 
explicit constructions have primarily aimed at optimizing the parameters of the expander, with the 
evaluation time of the neighbor function being of secondary interest, as long as it can be bounded 
by poly log u. This evaluation time is excessive in applications that, provided access to the neighbor 
function of an optimal expander, would use time proportional to $d$, where $d$ is typically constant 
or at most logarithmic in $|U|$. 

*A shorter version of this paper appears in the Proceedings of STOC 2015. The results in this version slightly 
improve those in the proceedings version for small space; see Section [?]. 

In this paper we focus on optimizing the parameters of the expander while minimizing the space 
usage of the representation and the evaluation time of the neighbor function. We present randomized 
constructions of unbalanced expanders in the standard word RAM model. Our constructions 
have near-optimal parameters, use space close to $|V|$, and support computing the $d$ neighbors of a 
vertex in time close to $d$. 

Hash functions and expander graphs. There is a close connection between k-independent 
hash functions and expanders. A k-independent function with appropriate parameters will, with 
some probability of failure, represent the neighbor function of a graph that expands on subsets of 
size k. This is what we refer to as going from independence to expansion, and the fact follows from 
the standard union bound analysis of probabilistic constructions of expanders. Going in the other 
direction, from expansion to independence, was first used by Siegel [22] as a technique for showing 
the existence of k-independent hash functions with evaluation time that does not depend on k. We 
follow in Siegel’s footsteps and a long line of work (see e.g. [9] for an overview) that focuses on the 
space-time tradeoff of k-independent hash functions over a universe of size $u = |U|$. 

Ideally, we would like to construct a data structure in the word RAM model that takes as input 
parameters u, k, and t, and returns a k-independent hash function over U. The hash function 
should use space $k(u/k)^{1/t}$ and have evaluation time $O(t)$, matching up to constant factors the 
space-time tradeoff of Siegel’s cell probe lower bound for k-independent hashing [22]. We present 
the first construction that comes close to matching the space-time tradeoff of the cell probe lower 
bound. 

Method. Our work is inspired by Siegel’s graph powering approach [22] and by recent advances in 
tabulation hashing [23], showing that it is possible to efficiently describe expanders in space much 
smaller than u. Our main insight is that it is possible to make simple, recursive expander 
constructions by alternating between strong unbalanced expanders and highly random hash functions. 
Similarly to previous work, we follow the procedure of letting a k-independent function represent a 
bipartite graph $\Gamma$ that expands on subsets of size k. We then apply a graph product to $\Gamma$ in order to 
increase the size of the universe covered by the graph while retaining expander properties. At each 
step of the recursion we return to k-independence by combining the graph product with a table 
of random bits, leaving us with a new k-independent function that covers a larger universe. By 
combining the technique of alternating between expansion and independence with a new and more 
efficient graph product, we can improve upon existing randomized constructions of unbalanced 
expanders. 

1.1 Our contribution 

Table 1 compares previous upper and lower bounds on k-independent hashing with our results, 
as presented in Corollaries 1, 2, and 3. As can be seen, most results present a trade-off between 
time and space controlled by a parameter t. Tight lower and upper bounds have been known only 
in the cell probe model, but our new construction nearly matches the cell probe lower bound by 
Siegel [22]. 

The time bound for the construction using explicit expanders [11] uses the degree of the 
expander as a conservative lower bound, based on the possibility that the neighbor function in their 
construction can be evaluated in constant time in the word RAM model. The time bound that 
follows directly from their work is poly log u. While the constant factors in the exponent of the 
space usage of [22, 23] have likely not been optimized, their techniques do not seem to be able to 
yield space close to the cell probe lower bound. 

Table 1: Space-time tradeoffs for k-independent hash functions 

Reference                     | Space                                | Time
Polynomials [13], [?]         | $k$                                  | $O(k)$
Preprocessed polynomials [15] | $k^{1+\varepsilon}(\log u)^{1+o(1)}$ | (poly log k)(log […]
Expanders [11] + [22]         |                                      | $d = O(\log(u)\log(k))^{1+1/\varepsilon}$
Expander powering [22]        |                                      | $O(1/\varepsilon)^{t}$
Double tabulation [23]        | k^{[…]} + […]                        | $O(t)$
Recursive tabulation [23]     | poly k + […]                         | $O(t^{\log t})$
Corollary 1                   |                                      | $O(t^2 + t^3 \log(k)/\log(u))$
Corollary 2                   |                                      | $O(t \log t + t^2 \log(k)/\log(u))$
Corollary 3*                  |                                      | $O(t \log t)$
Cell probe lower bound [22]   | $k(u/k)^{1/t}$                       | $t < k$ probes
Cell probe upper bound [22]   | $k(u/k)^{1/t}\, t$                   | $O(t)$ probes

Table notes: Space-time tradeoffs for k-independent hash functions from a domain of size u, with 
the trade-off controlled by a parameter t. Time bounds in the last two rows are numbers of cell 
probes, and the remaining rows refer to the word RAM model with word size $O(\log u)$. Leading 
constants in the space bounds are omitted. We use t to denote an arbitrary positive integer 
parameter that controls the trade-off, and we use $\varepsilon$ to denote an arbitrary positive constant. 
*Corollary 3 relies on the assumption k = […] 

As can be seen, our construction polynomially improves either space or time compared to each 
of the previously best trade-offs. We also find our construction easier to describe and analyze than 
the results of [?], with simplicity comparable to that of Siegel’s influential paper [22]. 

Like all other randomized constructions, our data structures come with an error probability, 
but this error probability is universal in the sense that if the construction works then it provides 
independent hash values on every subset of at most k elements from U. This is in contrast to other 
known constructions [?], [19] that give independence with high probability on each particular set 
of at most k elements, but will fail almost surely if independence for a superpolynomial number of 
subsets is needed. 

Applications. Efficient constructions of highly random functions are of fundamental interest with 
many applications in computer science. A k-independent function can, without changing the 
analysis, replace a fully random function in applications that only rely on k-subsets of inputs mapping to 
random values. We can therefore view k-independent functions as space and randomness efficient 
alternatives to fully random functions, capable of providing compact representations of complex 
structures such as expander graphs over very large domains. Apart from the construction of 
expander graphs with a small description, as an example application, k-independent functions with a 
universal error probability can be used to construct “real-time” dictionaries that are able to handle 
extremely long (in expectation) sequences of insertion and deletion operations in constant time per 
operation before failing. 

Let r > 1 be a constant parameter. We use a k-independent hash function with k = […] to 
split a set of n machine words of w bits into $O(n)$ subsets such that each subset has size at most 
k, with probability at least 1 − […]. Handling each subset with Thorup’s recent construction of 
dictionaries for sets of size […] using time $O(t)$ per operation [21] we get a dynamic dictionary 
in which, with high probability, every operation in a sequence of length $\ell \le$ […] takes constant 
time. In comparison the hash functions of [8], [?], [19] can only guarantee that sequences of length 
$\ell \le \mathrm{poly}(n)$ operations, where $n \le 2^w$, succeed with high probability. The splitting hash function 
needs space […], which might exceed the space usage of an individual dictionary, but this can 
be seen as a shared resource that is used for many dictionaries (in which case we bound the total 
number of operations before failure). 

2 Background and overview 

In the analysis of randomized algorithms we often assume access to a fully random function of 
the form $f : [u] \to [r]$ where $[n]$ denotes the set $\{0, 1, \dots, n-1\}$. To represent such a function we 
need a table with u entries of log r bits. This is impractical in applications such as hashing-based 
dictionaries where we typically have that $u \gg r$ and the goal is to use space $O(r)$ to store r elements 
of $[u]$. Fortunately, the analysis that establishes the performance guarantees of a randomized 
algorithm can often be modified to work even in the case where the function f has weaker randomness 
properties. 

One such concept of limited randomness is k-independence, first introduced to computer science 
in the 1970s through the work of Carter and Wegman on universal hashing [1]. A family of functions 
from $[u]$ to $[r]$ is k-independent if, for every subset of $[u]$ of cardinality at most k, the output of a 
random function from the family evaluated on the subset is independent and uniformly distributed 
in $[r]$. Trivially, the family of all functions from $[u]$ to $[r]$ is k-independent, but representing a 
random function from this family uses too much space. It was shown in [?] that for every finite 
field $\mathbb{F}$ the family of functions that consists of all polynomials over $\mathbb{F}$ of degree at most k − 1 is 
k-independent. A function from this family can be represented using near-optimal space [6] by storing 
the k coefficients of the polynomial. The mapping defined by a function f from a k-independent 
polynomial family over $\mathbb{F} = \{x_1, x_2, \dots, x_u\}$ takes the form 

$$
\begin{bmatrix} f(x_1) \\ f(x_2) \\ \vdots \\ f(x_u) \end{bmatrix}
=
\begin{bmatrix}
x_1^0 & x_1^1 & \cdots & x_1^{k-1} \\
x_2^0 & x_2^1 & \cdots & x_2^{k-1} \\
\vdots & \vdots & & \vdots \\
x_u^0 & x_u^1 & \cdots & x_u^{k-1}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{k-1} \end{bmatrix}
\qquad (1)
$$


The k-independence of the polynomial family follows from properties of the Vandermonde matrix: 
every subset of k rows is linearly independent. The problem with this construction is that the 
Vandermonde matrix is dense, resulting in an evaluation time of $\Omega(k)$ if we simply store the coefficients 
of the polynomial. The lower bounds by Siegel [22], and later Larsen [?], as presented in Table 1, 
show that a data structure for evaluating a polynomial of degree k − 1 using time t < k must use 
space at least $k(u/k)^{1/t}$. The data structure of [15] presents a step in this direction, but is still far 
from the lower bound for k-independent functions. 
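
To make the cost of the polynomial family concrete, the following Python sketch stores the k coefficients and evaluates the polynomial with Horner's rule. The field size (the Mersenne prime 2^61 − 1) and the names `PRIME` and `PolyHash` are illustrative assumptions, not part of the paper; the point is only that each evaluation reads all k coefficients.

```python
import random

PRIME = (1 << 61) - 1  # assumed field size: the Mersenne prime 2^61 - 1


class PolyHash:
    """Random polynomial of degree at most k - 1 over the field Z_PRIME."""

    def __init__(self, k, seed=None):
        rng = random.Random(seed)
        # The k coefficients a_0, ..., a_{k-1} are the entire representation.
        self.coeffs = [rng.randrange(PRIME) for _ in range(k)]

    def __call__(self, x):
        # Horner's rule: one multiplication and one addition per coefficient,
        # so the evaluation time is proportional to k.
        acc = 0
        for a in reversed(self.coeffs):
            acc = (acc * x + a) % PRIME
        return acc
```

Calling `PolyHash(k)(x)` therefore costs k multiply-add steps, which is the linear evaluation time that the constructions in this paper aim to avoid.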

The quest for k-independent families of functions with evaluation time t < k can be viewed 
as attempts to construct compact representations of sparse matrices that fill the same role as the 
Vandermonde matrix. We are interested in compact representations that support fast computation 
of the sparse row associated with an element $x \in [u]$. An example of a sparse matrix with these 
properties is the adjacency matrix of a bipartite expander graph with sufficiently strong expansion 
properties. For the purposes of constructing k-independent hash functions we are primarily 
interested in expanders that are highly unbalanced. 

Expander hashing. Prior constructions of fast and highly random hash functions have followed 
Siegel’s approach of combining expander graphs with tables of random words. If $\Gamma$ is a k-unique 
expander graph (see Definition 1) then we can construct a k-independent function by composing 
it with a simple tabulation function h. This approach would yield optimal k-independent hash 
functions if we had access to explicit expanders with optimal parameters that could be evaluated 
in time proportional to the left outdegree. Unfortunately, no explicit construction of a k-unique 
expander with optimal parameters is known. 

Siegel [22] addresses this problem by storing a smaller randomly generated k-unique expander, 
say, one that covers a universe of size $u^{1/t}$. By the k-independent hashing lower bound, if an 
expander with $|U| = u^{1/t}$ has degree d, then in order for it to be k-unique it must have a right hand 
side of size $|V| \ge k(u^{1/t}/k)^{1/d}$. To give a space efficient construction of a k-unique expander that 
covers a universe of size u, Siegel repeatedly applies the Cartesian product to the graph. Applying 
the Cartesian product t times to a k-unique expander results in a graph that remains k-unique but 
with the left degree and size of the left and right vertex sets raised to the power t. Using space […] 
to store an expander with degree t, it follows from the lower bound that the expander resulting 
from repeatedly applying the Cartesian product must have 

$$|V'| \ge \left(k (u^{1/t}/k)^{1/d}\right)^{t} = […]$$

Setting $d = 1/\varepsilon$, the randomly generated k-unique expander that forms the basis of the construction 
has degree $O(1/\varepsilon)$, leading to the expression in Table 1. Since we need to store $|V'|$ random words 
in a table in order to create a k-independent hash function, Siegel’s graph powering approach offers 
a space-time tradeoff that is far from the lower bound from our perspective where u, k, and 
t are all parameters to the hash function. 

Thorup [23] shows that, for the right choice of parameters, a simple tabulation hash function is 
likely to form a compact representation of a k-unique expander. A simple tabulation function takes 
a string $x = (x_1, x_2, \dots, x_c)$ of c characters from some input alphabet $\langle n \rangle = \{0,1\}^n$, and returns a 
string of d characters from some output alphabet $\langle m \rangle = \{0,1\}^m$. The simple tabulation function 
$h : \langle n \rangle^c \to \langle m \rangle^d$ is evaluated by taking the exclusive-or of c table-lookups 

$$h(x) = h_1(x_1) \oplus h_2(x_2) \oplus \cdots \oplus h_c(x_c)$$

where $h_i : \langle n \rangle \to \langle m \rangle^d$ is a random function. The advantage of a simple tabulation function 
compared to a fully random function is that we only need to store the random character tables 
$h_1, h_2, \dots, h_c$. Thorup is able to show that for $d \ge 6c$ a simple tabulation function is k-unique 
with a low probability of failure when $k \le$ […]. Setting n = m and composing the k-unique 
expander resulting from a single application of simple tabulation with another simple tabulation 
function, Thorup first constructs a hash function with space usage […], independence […], and 
evaluation time $O(c)$. He then presents a second trade-off with space […], independence […], 
and time $O(c^{\log c})$ that comes from applying simple tabulation recursively to the output of a simple 
tabulation function. Similar to Siegel’s upper bound, the space usage of Thorup’s upper bounds 
with respect to k is much larger than the lower bound, as can be seen from Table 1 where the 
space-time tradeoffs of his results have been parameterized in terms of the independence k.¹ 

¹ It should be noted that Thorup’s analysis is not tuned to optimize the polynomial dependence on k, and that 
he gives stronger concrete parameters for some realistic parameter settings. 

Explicit constructions. The literature on explicit constructions has mostly focused on optimizing 
the parameters of the expander, with the evaluation time of the neighbor function being of 
secondary interest, as long as it is bounded by poly log u. As can be seen from Siegel’s cell probe 
lower and upper bounds, optimal constructions of k-independent hash functions have evaluation 
time in the range t = 1 to t = log u. Therefore, an explicit construction, even if we had one 
with optimal parameters, would without further guarantees on the running time not be enough 
to solve our problem of constructing efficient expanders. Here we briefly review the construction 
given by Guruswami et al. [11]. It is, to our knowledge, currently the best explicit construction 
of unbalanced bipartite expanders in terms of the parameters of the graph. Their construction 
and its analysis are, similarly to the polynomial hash function in equation (1), algebraic in nature 
and inspired by techniques from coding theory, in particular Parvaresh-Vardy codes and related 
list-decoding algorithms [?]. In their construction, a vertex x is identified with its Reed-Solomon 
message polynomial over a finite field $\mathbb{F}$. The ith neighbor of x is found by taking a sequence of 
powers of the message polynomial over an extension field, evaluating each of the resulting 
polynomials in the ith element of $\mathbb{F}$, and concatenating the output. In contrast, the constructions presented 
in this paper only use the subset of standard word RAM instructions that can be implemented in 
$AC^0$. In Table 1 we have assumed that we can evaluate their neighbor function in constant time 
as a conservative lower bound on the performance of their construction in the word RAM model. 
Other highly unbalanced explicit constructions given in [?] offer a tradeoff where either one of 
$d$ or $|V|$ is quasipolynomial in the lower bound. In comparison, the construction by Guruswami et 
al. is polynomial in both of these parameters. 

3 Our constructions 

In this section we present three randomized constructions of efficient expanders in the word RAM 
model. Each construction offers a different tradeoff between space, time, and the probability of 
failure. We present our constructions as data structures, with the randomness generated by the 
model during an initialization phase. The initialization time of our data structures is always 
bounded by their space usage, and to simplify the exposition we therefore only state the latter. 
Alternatively, our constructions could be viewed directly as randomized algorithms, taking as input 
a list of parameters, a random seed, and a vertex $x \in [u]$ and returning the list of neighbors of x. 
The hashing corollaries presented in Table 1 follow directly from our three main theorems using 
Siegel’s expander hashing technique. 

3.1 Model of computation 

The algorithms presented in this paper are analyzed in the standard word RAM model with word 
size w as defined by Hagerup [12], modeling what can be implemented in a standard programming 
language like C [?]. In order to show how our algorithms benefit from word-level parallelism we 
use w as a parameter in the analysis. To simplify the exposition we impose the natural restriction 
that, for a given choice of parameters to a data structure, the word size is large enough to address 
the space used by the data structure. In other words, our results are stated with w as an unrestricted 
parameter, but are only valid when we actually have random access in constant time. 

The data structures we present require access to a source of randomness in order to initialize the 
character tables of simple tabulation functions. To accommodate this we augment the model with 
an instruction that uses constant time to generate a uniformly random and independent integer in 
$[r]$ where $r \le 2^w$. We note that our constructions use only the subset of arithmetic instructions 
required for evaluating a simple tabulation function, i.e., standard bit manipulation instructions, 
integer addition, and subtraction. Our results therefore hold in a version of the word RAM model 
that only uses instructions that can be implemented in $AC^0$, known in the literature as the restricted 
model [12] or the Practical RAM [18]. 

3.2 Notation and definitions 

Let $\langle n \rangle = \{0,1\}^n$ denote the alphabet of n-bit strings, and let $x = (x_1, x_2, \dots, x_c) \in \langle n \rangle^c$ denote 
a string of n-bit characters of length c. We define a concatenation operator $\|$ that takes as input 
two characters $x \in \langle n \rangle$ and $y \in \langle m \rangle$, and concatenates them to form $x \| y \in \langle n + m \rangle$. The 
concatenation operator can also be applied to strings of equal length where it performs 
component-wise concatenation. Given strings $x \in \langle n \rangle^c$ and $y \in \langle m \rangle^c$ the concatenation $x \| y$ is an element 
of $\langle n + m \rangle^c$ with the ith component of $x \| y$ defined by $(x \| y)_i = x_i \| y_i$. We also define a prefix 
operator. Given $x \in \langle n \rangle$ and a positive integer m, in the case where $m \le n$ we use $x[m] \in \langle m \rangle$ to 
denote the m-bit prefix of x. In the case where $m > n$ we pad the prefix such that $x[m] \in \langle m \rangle$ 
denotes $x[n] \| 0^{m-n}$ where $0^{m-n}$ is the character consisting of a string of m − n bits all set to 0. 
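
The two operators can be illustrated with a small Python sketch. Representing a character as a Python string of '0' and '1' symbols, and the function names `concat`, `concat_strings`, and `prefix`, are assumptions made here for readability; a word RAM implementation would use shifts and masks on machine words instead.

```python
def concat(x, y):
    """x || y for two characters given as bit strings."""
    return x + y


def concat_strings(xs, ys):
    """Component-wise concatenation of two equal-length strings of characters."""
    if len(xs) != len(ys):
        raise ValueError("strings must have equal length")
    return [concat(x, y) for x, y in zip(xs, ys)]


def prefix(x, m):
    """x[m]: the m-bit prefix of x, padded with zeros when m exceeds len(x)."""
    if m <= len(x):
        return x[:m]
    return x + "0" * (m - len(x))
```

For example, `prefix("10", 5)` pads the two-bit character with three zero bits, matching the case m > n above.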

We will present word RAM data structures that represent functions of the form $\Gamma : \langle n \rangle^c \to \langle m \rangle^d$. 
The function $\Gamma$ defines a d-regular bipartite graph with input set $\langle n \rangle^c$ and output set $\{1, 2, \dots, d\} \times \langle m \rangle$. 
For $S \subseteq \langle n \rangle^c$ we overload $\Gamma$ and define $\Gamma(S) = \{(i, \Gamma(x)_i) \mid x \in S,\ i \in \{1, \dots, d\}\}$, i.e., $\Gamma(S)$ is the set of 
outputs of S. We are interested in constructing functions where every subset S of inputs of size at 
most k contains an input that has many unique neighbors, formally: 

Definition 1. Let $\Gamma : \langle n \rangle^c \to \langle m \rangle^d$ be a function satisfying the following property: 

$$\forall S \subseteq \langle n \rangle^c,\ |S| \le k,\ \exists x \in S : |\Gamma(\{x\}) \setminus \Gamma(S \setminus \{x\})| > l.$$

Then, for $l \ge 0$ we say that $\Gamma$ is k-unique. If further $l \ge d/2$ we say that $\Gamma$ is k-majority-unique. 
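
Definition 1 can be checked by brute force on tiny examples. The sketch below is only an illustration: the dictionary representation of the function, the 0-based output positions, and the names `neighbors` and `is_k_unique` are assumptions introduced here, and the running time is exponential in k.

```python
from itertools import combinations


def neighbors(gamma, x):
    """Neighbor set of x: the pairs (i, gamma[x][i]) for i = 0, ..., d - 1."""
    return {(i, y) for i, y in enumerate(gamma[x])}


def is_k_unique(gamma, k, l=0):
    """Check that every nonempty S with |S| <= k has a key with more than l
    neighbors that no other key in S shares.

    gamma maps each key to a tuple of d output characters.
    """
    keys = list(gamma)
    for size in range(1, min(k, len(keys)) + 1):
        for subset in combinations(keys, size):
            found = False
            for x in subset:
                others = set()
                for y in subset:
                    if y != x:
                        others |= neighbors(gamma, y)
                if len(neighbors(gamma, x) - others) > l:
                    found = True
                    break
            if not found:
                return False
    return True
```

With l = 0 this tests k-uniqueness; passing l = d/2 tests k-majority-uniqueness.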

For completeness we define the concept of k-independence: 

Definition 2. Let k be a positive integer and let $\mathcal{F}$ be a family of functions from U to R. We say 
that $\mathcal{F}$ is a k-independent family of functions if, for every choice of $l \le k$ distinct keys $x_1, \dots, x_l$ 
and arbitrary values $y_1, \dots, y_l$, then, for f selected uniformly at random from $\mathcal{F}$ we have that 

$$\Pr[f(x_1) = y_1 \wedge f(x_2) = y_2 \wedge \cdots \wedge f(x_l) = y_l] = |R|^{-l}.$$
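
For a small finite family, Definition 2 can be verified exhaustively by counting. The following sketch does this for the polynomial family over a small prime field; the helper names `is_k_independent` and `polynomial_family` and the choice p = 5 in the usage note are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations, product


def is_k_independent(family, domain, range_size, k):
    """Exhaustively test Definition 2 for a finite family given as a list of callables."""
    for l in range(1, k + 1):
        for keys in combinations(domain, l):
            counts = Counter(tuple(f(x) for x in keys) for f in family)
            # Every one of the range_size**l value tuples must occur equally often.
            if len(counts) != range_size ** l:
                return False
            if len(set(counts.values())) != 1:
                return False
    return True


def polynomial_family(p, k):
    """All polynomials of degree at most k - 1 over Z_p (p is assumed to be prime)."""
    def make(coeffs):
        return lambda x: sum(a * x ** i for i, a in enumerate(coeffs)) % p
    return [make(c) for c in product(range(p), repeat=k)]
```

For instance, `is_k_independent(polynomial_family(5, 2), range(5), 5, 2)` holds, while the same family cannot be 3-independent because it contains only 25 functions.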

Simple tabulation functions are an important tool in our constructions. Our data structures 
can be made to consist entirely of simple tabulation functions and our evaluation algorithms can 
be viewed as a sequence of adaptive calls to this collection of simple tabulation functions. 

Definition 3. Let $(R, \oplus)$ denote an abelian group. A simple tabulation function $h : \langle n \rangle^c \to R$ is 
defined by 

$$h(x) = \bigoplus_{i=1}^{c} h_i(x_i)$$

where each character table $h_i : \langle n \rangle \to R$ is a k-independent function. 

In this paper we consider simple tabulation functions with character tables that operate either 
on bit strings under the exclusive-or operation, $R = (\langle m \rangle, \oplus)$, or on sets of non-negative integers 
modulo some integer r, $R = ([r], +)$. 
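
A minimal Python sketch of Definition 3 for both groups is given below. It assumes fully random character tables (which are k-independent for every k) and represents characters as integers in the range 0 to 2^n − 1; the class and function names are hypothetical.

```python
import random
from operator import xor


class SimpleTabulation:
    """h(x) = h_1(x_1) (+) ... (+) h_c(x_c) with fully random character tables."""

    def __init__(self, c, n, sample, op, identity, seed=None):
        rng = random.Random(seed)
        self.op = op
        self.identity = identity
        # One table per character position, with 2**n entries each.
        self.tables = [[sample(rng) for _ in range(1 << n)] for _ in range(c)]

    def __call__(self, x):
        acc = self.identity
        for table, ch in zip(self.tables, x):
            acc = self.op(acc, table[ch])
        return acc


def xor_tabulation(c, n, m, seed=None):
    """Range is the group of m-bit strings under exclusive-or."""
    return SimpleTabulation(c, n, lambda rng: rng.getrandbits(m), xor, 0, seed)


def additive_tabulation(c, n, r, seed=None):
    """Range is the integers modulo r under addition."""
    return SimpleTabulation(c, n, lambda rng: rng.randrange(r),
                            lambda a, b: (a + b) % r, 0, seed)
```

Evaluation performs exactly c table lookups and c group operations, independent of the amount of randomness stored in the tables.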




3.3 From k-uniqueness to k-independence 

In his seminal paper Siegel [22] showed how a k-unique function can be combined with a table 
of random elements in order to define a k-independent family of functions. In his paper on the 
expansion properties of tabulation hash functions, Thorup [23, Lemma 2] used a slight variation of 
Siegel’s technique that makes use of the position-sensitive structure of the bipartite graph defined 
by $\Gamma : \langle n \rangle^c \to \langle m \rangle^d$. This is the version we state here. 

Lemma 1 (Siegel [22], Thorup [23]). Let $\Gamma : \langle n \rangle^c \to \langle m \rangle^d$ be k-unique and let $h : \langle m \rangle^d \to R$ be a 
simple tabulation function. Then $h \circ \Gamma$ defines a family of k-independent functions. We sample a 
function from the family by sampling the character tables of h. 
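
Lemma 1 can be observed directly on a toy instance by enumerating every choice of character tables. The three-key function `GAMMA` below (d = 2 output characters of one bit each, range R = {0, 1} under exclusive-or) is a hand-made example and an assumption of this sketch, as are the helper names.

```python
from collections import Counter
from itertools import product

# A 3-unique function on three keys with d = 2 output characters of m = 1 bit each.
GAMMA = {"a": (0, 0), "b": (0, 1), "c": (1, 1)}


def compose(tables, gamma, x):
    """(h o gamma)(x), where h XORs one table lookup per output character."""
    acc = 0
    for table, ch in zip(tables, gamma[x]):
        acc ^= table[ch]
    return acc


def joint_distribution(gamma):
    """Count the output tuples of h o gamma over all choices of character tables."""
    counts = Counter()
    # Each of the d = 2 tables maps one bit to one bit: 2**4 = 16 choices in total.
    for bits in product((0, 1), repeat=4):
        tables = [bits[0:2], bits[2:4]]
        counts[tuple(compose(tables, gamma, x) for x in sorted(gamma))] += 1
    return counts
```

For `GAMMA`, each of the 8 possible output triples occurs for exactly 2 of the 16 table choices, so the three hash values are uniform and independent. A function that is not k-unique on its key set loses this property: some output tuples are never produced.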


3.4 From k-independence to k-uniqueness 

A k-independent function has the same properties as a fully random function when considering 
k-subsets of inputs. Randomized constructions of k-unique functions only need to consider k-subsets 
of inputs. We can therefore use the standard analysis of randomized constructions of bipartite 
expanders to show that, for the right choice of parameters, a k-independent function is likely to 
be k-unique. For completeness we provide an analysis here. In our exposition it will be convenient 
to parameterize the k-uniqueness or k-majority-uniqueness of our constructions in terms of a positive 
integer $\kappa$ such that $k = 2^\kappa$. 

Lemma 2. For every choice of positive integers $c, n, \kappa$ let $\Gamma : \langle n \rangle^c \to \langle m \rangle^d$ be a $2^\kappa$-independent 
function. Then, 

- for $m \ge n + \kappa + 1$ and $d \ge 4c$ we have that $\Gamma$ is $2^\kappa$-unique with probability at least $1 - 2^{-dn/2}$, 

- for $m \ge n + \kappa + 4$ and $d \ge 8c$ we have that $\Gamma$ is $2^\kappa$-majority-unique with probability at least 
$1 - 2^{-dn/4}$. 


Proof. We will give the proof for k-majority-uniqueness. The proof for k-uniqueness uses the same 
technique. By a standard argument based on the pigeonhole principle, for $\Gamma$ to be 
k-majority-unique it suffices that for all $S \subseteq \langle n \rangle^c$ with $|S| \le k$ we have that $|\Gamma(S)| > (3/4)d|S|$. Given that 
$\Gamma$ is k-independent, we will now bound the probability that there exists a subset S with $|S| \le k$ 
such that $|\Gamma(S)| \le (3/4)d|S|$. For every pair of sets $(S, B)$ satisfying that $S \subseteq \langle n \rangle^c$ with $|S| \le k$ 
and $B \subseteq \{1, 2, \dots, d\} \times \langle m \rangle$ with $|B| = (3/4)d|S|$, the probability that $\Gamma(S) \subseteq B$ is given by 
$\prod_{i=1}^{d} (|B_i|/2^m)^{|S|}$ where $B_i = \{(i, y) \in B\}$. By the inequality of the arithmetic and geometric 
means we have that 

$$\prod_{i=1}^{d} \left(\frac{|B_i|}{2^m}\right)^{|S|} \le \left(\frac{|B|}{d\,2^m}\right)^{d|S|}.$$

This allows us to ignore the structure of B, and obtain a union bound that matches that of the 
standard non-compartmentalized probabilistic construction of bipartite expanders. The probability 
that $\Gamma$ fails to be k-majority-unique is upper bounded by 

$$\sum_{i=2}^{2^\kappa} \binom{2^{cn}}{i} \binom{d\,2^m}{(3/4)di} \left(\frac{(3/4)di}{d\,2^m}\right)^{di}.$$

For every choice of positive integers $c, n, \kappa$, for $m \ge n + \kappa + 4$ and $d \ge 8c$ we get a probability of 
failure less than $2^{-dn/4}$. □ 
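
As a numerical sanity check, the union bound can be evaluated exactly for small parameters. The sketch below assumes the sum runs over set sizes i = 2, ..., 2^κ with |B| = (3/4)di, and that d is a multiple of 4 so that |B| is an integer; the function name `union_bound` is hypothetical.

```python
from fractions import Fraction
from math import comb


def union_bound(c, n, kappa, m, d):
    """Evaluate the union bound on the failure probability exactly.

    Assumes d is a multiple of 4 so that (3/4) * d * i is an integer.
    """
    total = Fraction(0)
    for i in range(2, 2 ** kappa + 1):
        b = 3 * d * i // 4                      # |B| = (3/4) d |S|
        choices_s = comb(2 ** (c * n), i)       # subsets S of size i
        choices_b = comb(d * 2 ** m, b)         # candidate neighbor sets B
        prob = Fraction(b, d * 2 ** m) ** (d * i)
        total += choices_s * choices_b * prob
    return total
```

With c = 1, n = 2, κ = 1, m = n + κ + 4 = 7, and d = 8c = 8, the computed value lies well below the claimed bound 2^{-dn/4} = 1/16.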





3.5 A simple k-unique function 

In this section we introduce a simple construction of a k-unique function of the form $\Gamma : \langle n \rangle^c \to \langle m \rangle^d$. 
We obtain $\Gamma$ as the last in a sequence $\Gamma_1, \Gamma_2, \dots, \Gamma_c$ of k-unique functions $\Gamma_i : \langle n \rangle^i \to \langle m \rangle^d$. Each 
$\Gamma_i$ for i > 1 is defined in terms of $\Gamma_{i-1}$. At the bottom of the recursion we tabulate a k-independent 
function $\Gamma_1 : \langle n \rangle \to \langle m \rangle^d$. In the general step we apply $\Gamma_{i-1}$ to the length i − 1 prefix of the 
key $(x_1, x_2, \dots, x_{i-1})$, concatenate the result vector component-wise with the ith character $x_i$, and 
apply a simple tabulation function $h_i : \langle m + n \rangle^d \to \langle m \rangle^d$. The recursion is therefore given by 

$$\Gamma_i = h_i \circ (\Gamma_{i-1} \,\|\, I^{(d)}) \qquad (2)$$

where $I^{(d)} : \langle n \rangle \to \langle n \rangle^d$ is the repeated identity function. The following theorem summarizes the 
properties of $\Gamma$ in the word RAM model. 

Theorem 1. There exists a randomized data structure that takes as input positive integers $c, n, \kappa$ 
and initializes a function $\Gamma : \langle n \rangle^c \to \langle n + \kappa + 1 \rangle^{4c}$. In the word RAM model with word size w the 
data structure satisfies the following: 

- The space usage is $O(2^{2n+\kappa} c^3 (n + \kappa)/w)$. 

- The evaluation time of $\Gamma$ is $O(c^2 + c^3 (n + \kappa)/w)$. 

- The probability that $\Gamma$ is $2^\kappa$-unique is at least $1 - 2^{-cn}$. 

Proof. Set $m = n + \kappa + 1$ and $d = 4c$. We initialize $\Gamma$ by tabulating a k-independent function 
$\Gamma_1 : \langle n \rangle \to \langle m \rangle^d$ and simple tabulation functions $h_2, h_3, \dots, h_c$. In total we need to store c functions 
that each have $O(c)$ character tables with $O(2^{2n+\kappa})$ entries of $O(c(n + \kappa))$ bits. The space usage is 
therefore $O(2^{2n+\kappa} c^3 (n + \kappa)/w)$. The same bound holds for the time to initialize the data structure. 

The evaluation time of $\Gamma$ can be found by considering the recursion $\Gamma_i = h_i \circ (\Gamma_{i-1} \,\|\, I^{(d)})$. At 
each of the c steps we perform $O(c)$ lookups and take the exclusive-or of $O(c)$ bit strings of length 
$O(c(n + \kappa))$. The total evaluation time is therefore $O(c^2 + c^3 (n + \kappa)/w)$. 

Consider the function $\Gamma_i = h_i \circ (\Gamma_{i-1} \,\|\, I^{(d)})$. Conditioned on $\Gamma_{i-1}$ being k-unique, it is easy 
to see that $(\Gamma_{i-1} \,\|\, I^{(d)})$ is k-unique, and by Lemma 1 we have that $\Gamma_i$ is k-independent. For our 
choice of parameters, according to Lemma 2 the probability that $\Gamma_i$ fails to be k-unique is less than 
$2^{-2cn}$. Therefore, $\Gamma$ is k-unique if $\Gamma_1, \Gamma_2, \dots, \Gamma_c$ are k-unique. This happens with probability at 
least $1 - c\,2^{-2cn} \ge 1 - 2^{-cn}$. □ 
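
The recursion in equation (2) can be sketched in Python as follows. This is an illustration under several assumptions made here: characters are integers, an output vector of d characters of m bits is packed into a single Python integer, all tables are fully random, and the class name `RecursiveUnique` is hypothetical. It is only practical for very small n and κ, since each character table has 2^{m+n} entries.

```python
import random


class RecursiveUnique:
    """Sketch of Gamma = Gamma_c built from Gamma_i = h_i o (Gamma_{i-1} || I)."""

    def __init__(self, c, n, kappa, seed=None):
        rng = random.Random(seed)
        self.c, self.n = c, n
        self.m = n + kappa + 1      # output character width
        self.d = 4 * c              # number of output characters
        out_bits = self.m * self.d  # an output vector packed into one integer
        # Gamma_1: a fully random (hence k-independent) table on n-bit keys.
        self.base = [rng.getrandbits(out_bits) for _ in range(1 << n)]
        # h_2, ..., h_c: d character tables each, indexed by (m + n)-bit characters.
        self.steps = [
            [[rng.getrandbits(out_bits) for _ in range(1 << (self.m + n))]
             for _ in range(self.d)]
            for _ in range(c - 1)
        ]

    def _unpack(self, packed):
        mask = (1 << self.m) - 1
        return [(packed >> (self.m * j)) & mask for j in range(self.d)]

    def __call__(self, x):
        """Evaluate on a key x = (x_1, ..., x_c) of n-bit characters."""
        packed = self.base[x[0]]
        for tables, ch in zip(self.steps, x[1:]):
            acc = 0
            for table, y in zip(tables, self._unpack(packed)):
                # Character j of Gamma_{i-1}(prefix) || x_i indexes table j.
                acc ^= table[(y << self.n) | ch]
            packed = acc
        return self._unpack(packed)
```

Each of the c − 1 recursive steps performs d = 4c lookups and exclusive-ors, which mirrors the O(c) lookups per step counted in the proof.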

Combining Theorem 1 and Lemma 1 we get k-independent hashing in the word RAM model. 
We state our result in terms of a data structure that represents a family of functions $\mathcal{F}$. The family 
is defined as in Lemma 1 and represented by a particular instance of a function $\Gamma$, constructed 
using Theorem 1, together with the parameters of a family of simple tabulation functions. 

Corollary 1. There exists a randomized data structure that takes as input positive integers u, 
r = […], k, t and selects a family of functions $\mathcal{F}$ from $[u]$ to $[r]$. In the word RAM model with 
word length w the data structure satisfies the following: 

- The space used to represent $\mathcal{F}$, as well as a function $f \in \mathcal{F}$, is $O(k u^{1/t} t^2 (\log u + t \log k)/w)$. 

- The evaluation time of f is $O(t^2 + t^2 (\log u + t \log k)/w)$. 

- With probability at least 1 − 1/u we have that $\mathcal{F}$ is a k-independent family. 

Proof. We apply Theorem 1 setting $c = 2t$, $n = \lceil (\log u)/2t \rceil$, $\kappa = \lceil \log k \rceil$. This gives a function 
$\Gamma : \langle n \rangle^{2t} \to \langle n + \kappa + 1 \rangle^{8t}$ that is k-unique over $[u]$ with probability at least 1 − 1/u. To sample a 
function from the family we follow the approach of Lemma 1 and compose $\Gamma$ with a simple tabulation 
function $h : \langle n + \kappa + 1 \rangle^{8t} \to [r]$. The space used to store $\Gamma$ follows directly from Theorem 1 and 
dominates the space used by h. Similarly, the evaluation time of $h \circ \Gamma$ is dominated by the time it 
takes to evaluate $\Gamma$. □ 
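
The parameter choice in the proof is easy to compute. The helper below is a small sketch with a hypothetical name; it uses floating-point logarithms, which is adequate for illustration but not for universes so large that rounding in `log2` matters.

```python
from math import ceil, log2


def corollary1_parameters(u, k, t):
    """Parameters used in the proof: c = 2t, n = ceil(log2(u) / 2t), kappa = ceil(log2 k)."""
    c = 2 * t
    n = ceil(log2(u) / (2 * t))
    kappa = ceil(log2(k))
    # m and d follow the setting m = n + kappa + 1 and d = 4c of Theorem 1.
    return {"c": c, "n": n, "kappa": kappa, "m": n + kappa + 1, "d": 4 * c}
```

For u = 2^32, k = 2^10, and t = 4 this gives c = 8 characters of n = 4 bits, so that cn = 32 bits cover the universe, and d = 32 output characters of m = 15 bits.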

Remark. For every integer r > 1 we can construct a family that is k-independent with 
probability at least $1 - u^{-r}$ at the cost of increasing the space usage and evaluation time by a factor r. 
The family is defined by 

[…] 

where each $\Gamma_i$ is constructed independently. 

Remark. The recursion in equation (2) is well suited for sequential evaluation where the task is to 
evaluate $\Gamma$ in an interval of $[u]$, in order to generate a k-independent sequence of random variables. 
To see this, note that once we have evaluated $\Gamma$ on a key $x = (x_1, x_2, \dots, x_c)$, a change in the last 
character only changes the last step of the recursion. It follows that we can generate k-independent 
variables using amortized time $O(t)$ and space close to $O(k u^{1/t})$. To our knowledge, this presents 
the best space-time tradeoff for the generation of k-independent variables in the case where we do 
not have access to multiplication over a suitable finite field as in [7]. 

3.6 A divide and conquer approach 

In this section we introduce a data structure for representing a k-majority-unique function that 
offers a faster evaluation time at the cost of using more space. As in the simple construction 
from Theorem 1 we use the technique of alternating between expansion and independence, but 
rather than reading a single character at a time, we view the key as composed of two characters 
x = (x_1, x_2) and recurse on each. In the previous section we increased the size of the domain of our 
k-unique function by concatenating part of the key, forming the k-unique function Γ || 1^{(n)}. If we 
use only a few large characters this approach becomes very costly in terms of the space required 
to store the simple tabulation function h in the composition h ∘ (Γ || 1^{(n)}). To be able to efficiently 
recurse on large characters we show that the function Γ̂((x_1, x_2)) = Γ(x_1) || Γ(x_2) is k-unique when 
Γ is k-majority-unique. 

Lemma 3. Let Γ : (n)^c → (m)^d be a k-majority-unique function. Then Γ || Γ : (n)^c × (n)^c → (2m)^d 
is k-unique. 

Proof. To ease notation we define Γ̂ = Γ || Γ. Let x = (x_1, x_2) denote an element of (n)^c × (n)^c. 
For S ⊆ (n)^c × (n)^c define S_{i,a} = {x ∈ S | x_i = a}. The following holds for every x = (x_1, x_2) ∈ S. 

|Γ̂({x}) \ Γ̂(S \ {x})| = |Γ̂({x}) \ (Γ̂(S \ S_{1,x_1}) ∪ Γ̂(S_{1,x_1} \ {x}))| 

  = |(Γ̂({x}) \ Γ̂(S \ S_{1,x_1})) ∩ (Γ̂({x}) \ Γ̂(S_{1,x_1} \ {x}))|   (3) 

  ≥ |Γ̂({x}) \ Γ̂(S \ S_{1,x_1})| + |Γ̂({x}) \ Γ̂(S_{1,x_1} \ {x})| − |Γ̂({x})|. 

We will show that for every S ⊆ (n)^c × (n)^c with |S| ≤ k there exists a key (x_1, x_2) ∈ S such that 
|Γ̂({x}) \ Γ̂(S \ {x})| > 0. We begin by choosing the first component of x. Let π_j(S) = {x_j | x ∈ S} 
denote the set of jth components of S. By the k-majority-uniqueness of Γ, considering the set π_1(S), 
we have that 

∃ x_1 ∈ π_1(S) : ∀ x ∈ S_{1,x_1} : |Γ̂({x}) \ Γ̂(S \ S_{1,x_1})| > d/2. 

Fix x_1 with this property and consider the choice of x_2. By the k-majority-uniqueness of Γ, 
considering the set π_2(S), we have that 

∀ x_1 ∈ π_1(S) : ∃ x_2 ∈ π_2(S_{1,x_1}) : |Γ̂({x}) \ Γ̂(S_{1,x_1} \ {x})| > d/2. 

We can therefore always find a key (x_1, x_2) ∈ S such that both |Γ̂({x}) \ Γ̂(S \ S_{1,x_1})| > d/2 and 
|Γ̂({x}) \ Γ̂(S_{1,x_1} \ {x})| > d/2 are satisfied. The result follows from equation (3) where we use the 
fact that |Γ̂({x})| = d. □ 
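
The statement of Lemma 3 can be checked by brute force on a toy instance. The Python sketch below does this under the following assumptions, none of which are taken verbatim from the text: the neighbors of a key are modeled as (position, character) pairs, "k-unique" means that every nonempty key set of size at most k contains a key with at least one neighbor shared with no other key in the set, and "k-majority-unique" means the same with more than d/2 such neighbors. The table `TABLE` is a hypothetical example function with d = 3.

```python
from itertools import combinations, product

def neighbors(gamma, x):
    """Neighbor set of key x: one (position, character) pair per output position."""
    return {(l, ch) for l, ch in enumerate(gamma(x))}

def max_unique(gamma, subset):
    """Largest number of unique neighbors held by any single key in subset."""
    best = 0
    for x in subset:
        others = set()
        for y in subset:
            if y != x:
                others |= neighbors(gamma, y)
        best = max(best, len(neighbors(gamma, x) - others))
    return best

def is_unique(gamma, domain, k, threshold=0):
    """True if every nonempty subset of size <= k has a key with > threshold unique neighbors."""
    return all(
        max_unique(gamma, s) > threshold
        for size in range(1, k + 1)
        for s in combinations(domain, size)
    )

def concat(gamma):
    """The product (gamma || gamma): position-wise pairing of the two outputs."""
    return lambda x: tuple(zip(gamma(x[0]), gamma(x[1])))

# Hypothetical toy function on 4 keys with d = 3 output characters.
TABLE = {0: (0, 1, 2), 1: (1, 2, 0), 2: (2, 0, 1), 3: (0, 2, 1)}
gamma = TABLE.__getitem__
d, k = 3, 3
domain = list(TABLE)
pairs = list(product(domain, repeat=2))

majority = is_unique(gamma, domain, k, threshold=d / 2)  # k-majority-unique?
unique_product = is_unique(concat(gamma), pairs, k)      # is gamma || gamma k-unique?
```

For this table both checks succeed, in line with the lemma. The enumeration is exponential in k and is only meant for experimenting with very small parameters.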

We will give a recursive construction of a fc-majority-unique function of the form Fj : (n)^* — 
(m)^* . Let hi : /2'mf‘'^ —>■ (m)^* be a simple tabulation function. For i > 0 the recursion takes 
the following form. 

Γ_i = h_i ∘ (Γ_{i−1} || Γ_{i−1}). (4) 

At the bottom of the recursion we tabulate a k-independent function Γ_0. 
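
To make the shape of recursion (4) concrete, here is a small Python sketch of its evaluation order. It is an illustration under stated assumptions rather than the construction itself: level i is given 2^i output characters (a hypothetical choice made only to keep the example small), the character tables of h_i are filled lazily with pseudorandom values from Python's `random` module instead of being drawn with the required independence, and the names `make_gamma`, `n_bits`, and `m_bits` are invented for the example.

```python
import random

def make_gamma(levels, n_bits, m_bits, seed=0):
    """Build Gamma_levels by the recursion Gamma_i = h_i o (Gamma_{i-1} || Gamma_{i-1})."""
    rng = random.Random(seed)  # assumption: stand-in for the required independence

    def random_chars(count):
        return tuple(rng.getrandbits(m_bits) for _ in range(count))

    # Gamma_0: a fully tabulated random function on single n-bit characters.
    gamma0 = [random_chars(1) for _ in range(1 << n_bits)]
    # h_i: one lazily filled character table per input position (2^(i-1) positions),
    # each mapping a 2m-bit character to 2^i output characters of m bits.
    h_tables = {i: [dict() for _ in range(1 << (i - 1))] for i in range(1, levels + 1)}

    def h(i, chars):
        out = [0] * (1 << i)
        for table, ch in zip(h_tables[i], chars):
            if ch not in table:
                table[ch] = random_chars(1 << i)
            out = [a ^ b for a, b in zip(out, table[ch])]
        return tuple(out)

    def gamma(i, key):
        if i == 0:
            return gamma0[key[0]]
        half = len(key) // 2
        left, right = gamma(i - 1, key[:half]), gamma(i - 1, key[half:])
        # Concatenate position-wise: each character of (Gamma || Gamma) has 2m bits.
        paired = tuple((a << m_bits) | b for a, b in zip(left, right))
        return h(i, paired)

    return lambda key: gamma(levels, tuple(key))

gamma = make_gamma(levels=3, n_bits=4, m_bits=6, seed=7)
out = gamma([1, 2, 3, 4, 5, 6, 7, 8])
```

Each level makes two recursive calls on the halves of the key and then performs one lookup per character of the concatenated output, which is the structure that the evaluation-time recurrence in the proof of Theorem 2 accounts for.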

Theorem 2. There exists a randomized data structure that takes as input positive integers λ, n, κ 
and initializes a function F : (n)^^ —)• {n + k + In the word RAM model with word length w 

the data structure satisfies the following: 

- The space usage is O(2^{2(λ+n+κ)}(n + κ)/w). 

- The evaluation time of Γ is O(2^λ(λ + 2^λ(n + κ)/w)). 

- With probability at least 1 − 2^{−n+1} we have that Γ is 2^κ-majority-unique. 

Proof. Let m = n + κ + 4. We initialize Γ by tabulating Γ_0 and the character tables of the simple 
tabulation functions hi,h 2 , ■ ■ ■ ,h\ where hi : (2m)^* (m)^* . In total we have 0(2"'') tables 

with O(2^{2(n+κ)}) entries of O(2^λ(n + κ)) bits, resulting in a total space usage of 
O(2^{2(λ+n+κ)}(n + κ)/w). 

Let T(i) denote the evaluation time of Γ_i. For i = 0 we can evaluate Γ_0 by performing a single 
lookup in O(1) time. For i > 0 evaluating h_i ∘ (Γ_{i−1} || Γ_{i−1}) takes two evaluations of Γ_{i−1} followed by 
evaluating h_i on their concatenated output using O(2^i(1 + 2^i(n + κ)/w)) operations. The recurrence 
takes the form 

T(i) ≤ 2T(i − 1) + O(2^i(1 + 2^i(n + κ)/w))   if i > 0, 
T(i) ≤ O(1)                                     if i = 0. 

The solution to the recurrence is O(2^i(i + 2^i(n + κ)/w)). 
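
For completeness, the stated solution can be obtained by unrolling the recurrence. The derivation below is a sketch in which C is an assumed name for the constant hidden in the O-notation of the additive term, and i ≥ 1.

```latex
\begin{align*}
T(i) &\le 2T(i-1) + C\,2^{i}\bigl(1 + 2^{i}(n+\kappa)/w\bigr) \\
     &\le 2^{i}\,T(0) + \sum_{j=0}^{i-1} 2^{j}\cdot C\,2^{i-j}\bigl(1 + 2^{i-j}(n+\kappa)/w\bigr) \\
     &= 2^{i}\,T(0) + C\,2^{i}\Bigl(i + \frac{n+\kappa}{w}\sum_{j=0}^{i-1} 2^{i-j}\Bigr) \\
     &\le 2^{i}\,T(0) + C\,2^{i}\bigl(i + 2^{i+1}(n+\kappa)/w\bigr) \\
     &= O\bigl(2^{i}\,(i + 2^{i}(n+\kappa)/w)\bigr).
\end{align*}
```

The last step uses T(0) = O(1) and the geometric sum 2^{i} + 2^{i-1} + ... + 2 ≤ 2^{i+1}.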

We now turn our attention to the probability that Γ_i = h_i ∘ (Γ_{i−1} || Γ_{i−1}) fails to be k-majority- 
unique. Conditional on Γ_{i−1} being k-majority-unique, by Lemma 3 we have that (Γ_{i−1} || Γ_{i−1}) 
is k-unique and composing it with h_i gives us a k-independent function. For our choice of 
parameters, according to Lemma 2 the probability that Γ_i fails to be k-majority-unique is less than 
2^{−2^i n}. Therefore, Γ is k-majority-unique if Γ_0, Γ_1, …, Γ_λ are k-majority-unique. This happens 
with probability at least 1 − Σ_{i=0}^{λ} 2^{−2^i n} ≥ 1 − 2^{−n+1}. □ 

Remark. The recursion in equation (4) is well suited for parallelization. If we have c processors 
working in lock-step with some small shared memory we can evaluate Γ with domain (n)^c in 
time O(c). 

Corollary 2. There exists a randomized data structure that takes as input positive integers u, 
r = u^{O(1)}, k, t and selects a family of functions F from [u] to [r]. In the word RAM model with 
word length w the data structure satisfies the following: 

- The space used to represent F, as well as a function f ∈ F, is O(k² u^{1/t} t(log u + t log k)/w). 

- The evaluation time of f is O(t log t + t(log u + t log k)/w). 

- With probability at least 1 − u^{−1/(2t)} we have that F is a k-independent family. 

Proof. Apply Theorem 2 with parameters λ = ⌈log t⌉ + 1, n = ⌈(log u)/(2t)⌉ + 1, and κ = ⌈log k⌉. 
This gives a function Γ that is k-unique over [u] with probability at least 1 − u^{−1/(2t)}. The family F 
is defined by the composition of Γ with a suitable simple tabulation function following the approach 
of Lemma 1. □ 
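
The parameter substitution behind Corollary 2 can be spelled out as follows. This is a sketch that assumes the bounds of Theorem 2 read O(2^{2(λ+n+κ)}(n+κ)/w) for space and O(2^λ(λ + 2^λ(n+κ)/w)) for time, as the proof of that theorem indicates, and that the failure probability is 2^{−n+1}.

```latex
\begin{align*}
2^{\lambda} &\le 4t, \qquad 2^{n} \le 4\,u^{1/(2t)}, \qquad 2^{\kappa} \le 2k, \qquad
n + \kappa = O\bigl((\log u)/t + \log k\bigr), \\
\text{space:}\quad
2^{2(\lambda+n+\kappa)}\,\frac{n+\kappa}{w}
  &= O\!\left(t^{2}\,u^{1/t}\,k^{2}\cdot\frac{(\log u)/t + \log k}{w}\right)
   = O\!\left(k^{2}\,u^{1/t}\,t\,\frac{\log u + t\log k}{w}\right), \\
\text{time:}\quad
2^{\lambda}\Bigl(\lambda + 2^{\lambda}\,\frac{n+\kappa}{w}\Bigr)
  &= O\!\left(t\log t + t\,\frac{\log u + t\log k}{w}\right), \\
\text{failure:}\quad
2^{-n+1} &\le 2^{-(\log u)/(2t)} = u^{-1/(2t)}.
\end{align*}
```

The factor k² in the space bound reflects the 2^{2(n+κ)} entries of each character table, which is the quadratic dependence on k that Section 3.7 sets out to remove.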

3.7 Balancing time and space 

Theorem 1 yielded a k-unique function over (n)^c with an evaluation time of about O(c²) while using 
linear space in k. Theorem 2 resulted in an evaluation time of about O(c log c), using quadratic 
space in k. Under a mild restriction on k, the two techniques can be combined to obtain an 
evaluation time of O(c log c) and linear space in k. We take the construction from Theorem 2 as 
our starting point, but instead of tabulating the character tables of h_1, …, h_λ we replace them with 
more space efficient k-independent functions that we construct using Theorem 1. 

Theorem 3. There exists a randomized data structure that takes as input positive integers λ, n, 
K = 0{n) and initializes a function T : (n)^^ —)• {n + k + . In the word RAM model with 

word size w the data structure satisfies the following: 

- The space usage is O(2^{n+κ+2λ} n/w). 

- The evaluation time of Γ is O(2^λ(λ + 2^λ n/w)). 

- With probability at least 1 − 2^{−n+1} we have that Γ is 2^κ-majority-unique. 

Proof. At the top level, the recursion underlying Γ takes the same form as in Theorem 2, 

Γ_i = h_i ∘ (Γ_{i−1} || Γ_{i−1}). 

The functions hi : (2m)^*''"^ —>■ are simple tabulation functions with m = n + k + 4. Each 

hi is constructed from 2*"*'^ character tables hij : (2m) —>■ (m)^* . Theorem [2] only assumes that 

the character tables h_{i,j} are k-independent functions. We will apply Theorem 1 to construct a 
function Γ̂ that we for each character table h_{i,j} compose with a simple tabulation function g_{i,j} 
in order to construct h_{i,j}. By the restriction that κ = O(n) we have that m = O(n). We set 
the parameters of Γ̂ to ĉ = O(1), n̂ = ⌈n/2⌉, κ̂ = κ such that (2m) can be embedded in (n̂)^ĉ. 
Furthermore, T uses 0{2'^~^'^n/w) words of space, can be evaluated in 0(1) operations, and is 
A:-unique with probability at last 1 — 2~'^~^. Because T has 0(1) output characters, the time to 
evaluate h_{i,j} = g_{i,j} ∘ Γ̂ is no more than a constant times the word length of the output of h_{i,j}. The 
time to evaluate Γ therefore only increases by a constant factor compared to the evaluation time 
in Theorem 2. 

The probability of failure of Γ to be k-majority-unique is the same as in Theorem 2 provided that 
T does not fail to be /c-unique. This gives a total probability of failure of less than -|-2~”'“^ < 

2-n+l_ 

We only store a single Γ̂ and the character tables of g_{i,j} that we use to simulate the character 
tables h_{i,j}. From the parameters of Γ̂ we have that g_{i,j} uses O(1) character tables with O(2^{n+κ}) 
entries of O(2^i n/w) words. The space usage is dominated by the O(2^λ) character tables of h_λ that 
use space O(2^{n+κ+2λ} n/w) in total. □ 
Corollary 3. There exists a randomized data structure that takes as input positive integers u, 
r = u^{O(1)}, t, k = u^{O(1/t)} and selects a family of functions F from [u] to [r]. In the word RAM 
model with word length w the data structure satisfies the following: 

- The space used to represent F, as well as a function f ∈ F, is O(k u^{1/t} t(log u)/w). 

- The evaluation time of f is O(t log t + t(log u)/w). 

- With probability at least 1 − u^{−1/t} we have that F is a k-independent family. 

Proof. Apply Theorem 3 with parameters λ = ⌈log t⌉, n = ⌈(log u)/t⌉ + 1, and κ = ⌈log k⌉. This 
gives a function Γ that is k-unique over [u] with probability at least 1 − u^{−1/t}. The family F is 
defined by the composition of Γ with a suitable simple tabulation function following the approach 
of Lemma 1. □ 

3.8 An improvement for space close to k 

In this section we present a different space efficient version of the divide-and-conquer recursion. The 
new recursion is based on an extension of the ideas behind the graph product from Lemma 3. In 
Lemma 3 we use expansion properties over subsets of size k and concatenate the output characters 
of Γ, resulting in an output domain of size at least k². By using stronger expansion properties and 
modifying our graph concatenation product to fit the structure of the key set, we are able to reduce 
the space usage at the cost of using more time. We now introduce a property that follows from 
stronger edge expansion. 

Definition 4. Let Γ : (n)^c → (m)^d be a function satisfying the following property: 

∀S ⊆ (n)^c, |S| ≤ k, ∃A ⊆ S, |A| > |S|/2 : ∀x ∈ A : |Γ({x}) \ Γ(S \ {x})| > d/2. 

Then we say that Γ is k-super-majority-unique. 

The following lemma shows how we can construct a k-unique function over U² from a set of 
k-super-majority-unique functions over U. For a bit string x we will use the notation x[m] to denote 
the m-bit prefix of x0^m, i.e., a zero-padded m-bit prefix of x. 
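
The prefix notation is easy to state in code. In this minimal Python sketch bit strings are modeled as `str` objects over the characters "0" and "1"; the function name `padded_prefix` is an assumption made for the example.

```python
def padded_prefix(x: str, m: int) -> str:
    """Return x[m]: the m-bit prefix of x followed by m zeros (zero-padded if x is short)."""
    return (x + "0" * m)[:m]

truncated = padded_prefix("1011", 2)  # x is longer than m bits: cut to the first m bits
padded = padded_prefix("1011", 6)     # x is shorter than m bits: pad with zeros
```

Both cases occur later in the section, where prefixes of different lengths are concatenated and then brought to a uniform character length m.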

Lemma 4. Let q be a positive integer. For j = 1, 2, …, q let Γ_j : (n)^c → (m_j)^d be min(2k^{j/q}, k)- 
super-majority-unique and set m = max_j(m_j + m_{q−j+1}). Then the function Γ : (n)^c × (n)^c → (m)^{qd} 
defined by 

Γ(x_1, x_2)_{(j−1)d+l} = (Γ_j(x_1)_l || Γ_{q−j+1}(x_2)_l)[m] for (j, l) ∈ {1, …, q} × {1, …, d} (5) 

is k-unique. 

Proof. Consider a set of keys S ⊆ (n)^c × (n)^c with |S| ≤ k. We will show that there exists an index 
j ∈ {1, …, q} and a key x = (x_1, x_2) ∈ S such that x has a unique neighbor with respect to S and 
Γ_j || Γ_{q−j+1}. Consider the set of first components of the set of keys π_1(S). For some j ∈ {1, …, q} 
we must have that k^{(j−1)/q} ≤ |π_1(S)| ≤ k^{j/q}. By the super-majority-uniqueness properties of Γ_j 
there must exist more than |π_1(S)|/2 first components x_1 ∈ π_1(S) such that Γ_j(x_1) has more than 
d/2 unique neighbors with respect to π_1(S). Furthermore, because |S| ≤ k, there exists at least 
one such x_1 that is a component of at most min(2k^{(q−j+1)/q}, k) keys. Following a similar argument 
to the proof of Lemma 3, by the majority-uniqueness properties of Γ_{q−j+1} there exists x_2 ∈ π_2(S_{1,x_1}) 
such that we get a unique neighbor. □ 
In the following lemma we use a single k-independent function to represent a set of k-super- 
majority-unique functions such that the concatenated product of these functions is k-unique. The 
proof of the lemma is omitted since it follows from using the approach of Lemma 2 to obtain 
expansion |Γ(S)| > (7/8)d|S|, and then applying Lemma 4 to obtain the k-uniqueness property. 

Lemma 5. For every choice of positive integers c, q, κ, let f : (κ)^c → (2κ + 12)^{16cq} be a 2^κ- 
independent function. For j = 1, …, q define Γ_j : (κ)^c → (⌈((j + 1)/q)κ⌉ + 12)^{16cq} by 

Γ_j(x)_l = f(x)_l[⌈((j + 1)/q)κ⌉ + 12] for l ∈ {1, …, 16cq}. (6) 

Let m = ⌈(1 + 3/q)κ⌉ + 26. Then the function Γ : (κ)^c × (κ)^c → (m)^{16cq²} defined by 

Γ(x_1, x_2)_{(j−1)16cq+l} = (Γ_j(x_1)_l || Γ_{q−j+1}(x_2)_l)[m] for (j, l) ∈ {1, …, q} × {1, …, 16cq} (7) 

is k-unique with probability at least 1 — . 

We remind the reader that the notation x[m] is used to denote the zero-padded m-bit prefix of x. 
Taking the prefix of the concatenated output characters of Γ_j and Γ_{q−j+1} is done with the sole 
purpose of padding the output characters of Γ to uniform length. 

We now define a randomized recursive construction of a k-unique function similar to the one 
in Theorem 2. The parameters of the data structure are λ, κ, and q. The parameters λ and κ 
determine the size of the universe and the desired k-uniqueness. The parameter q controls the 
space-time tradeoff of the character tables used in the recursion. At the outer level of the recursion, 
for i = 1, …, λ, we repeatedly square the size of the domain, constructing k-unique functions of the 
form Tj : (k)^* —>■ (2k -|- 26)^^^'^*. At level i of the recursion, we obtain a /c-independent function 
by composing Tj with a simple tabulation function /ij+i : (2 k- k 26)^ (2 kT he 
output of this function is then used to construct Γ_{i+1}, following the approach of Lemma 5 with the 
parameter q set to 3. For i = 1, 2, …, λ the recursion is described by the following set of equations 

Γ_i(x_1, x_2)_{(j−1)·48·2^i+l} = (Γ_{i,j}(x_1)_l || Γ_{i,4−j}(x_2)_l)[2κ + 26] 

Γ_{i,j}(x_s)_l = h_i(Γ_{i−1}(x_s))_l[⌈((j + 1)/3)κ⌉ + 12] (8) 

Γ_0(x_s)_l = x_s[2κ + 26] 

where the indices are j ∈ {1, 2, 3}, l ∈ {1, …, 48 · 2^i}, and s ∈ {1, 2}. We have defined Γ_0 by simply 
repeating the input 48 times, padded to length 2κ + 26, to ensure that it fits into the recursion. In 
practice we only require h_1 ∘ Γ_0 to be k-independent over domain (κ). 

To further reduce the space usage we apply the technique from Lemma 5 to implement the 
character tables of h_i. Each character table has domain (2κ + 26). We view this domain as 
consisting of two characters of length κ' = κ + 13. We apply Lemma 5 with parameters c = 1, 
q, and k = k' to construct a function T : (k')^ —> ([(1 -|- 3/g)K'] -|- 26)^®'^ that is h-unique with 
probability at least 1 — 2“^'^ . To facilitate fast evaluation we tabulate the /c-independent function 
f : (k') —)> ([(1-1- l/q)K'~\ -|- 12)^®'? used to construct T. The jih. character table of hi is constructed 
by composing T with an appropriate simple tabulation function, 

hij = Togij, (9) 

where gij : ([(1 -|- l/g)K'] -|- 12)^®'? —>■ (2 k -|- 12)^®'^* is tabulated. 

Theorem 4. There exists a randomized data structure that takes as input positive integers k, q, 
and initializes a function T : (k)^ (2k -|- 26)^®'^ . In the word RAM model with word length w 

the data structure satisfies the following: 
- The space usage is O(2^{2λ+(1+3/q)κ} q² κ/w). 

- The evaluation time of Γ is O(2^λ q²(λ + 2^λ κ/w)). 

- With probability at least 1 — we have that F is 2'^-unique. 

Proof. The total space usage is dominated by the simple tabulation functions used to implement 
the character tables of h_λ. There are O(2^λ) simple tabulation functions g_{λ,j}. Each of these has 
O(q²) character tables with a domain of size O(2^{(1+3/q)κ}) that map to bit strings of length O(2^λ κ). 
This gives a total space usage of O(2^{2λ+(1+3/q)κ} q² κ/w). 

Let T(i) denote the evaluation time of Γ_i. For i = 1 we can evaluate Γ_1 by performing a constant 
number of lookups into h_0 and combine prefixes of the output in O(1) time. For i > 1 evaluating 
Γ_i takes two evaluations of Γ_{i−1} and an additional amount of work combining prefixes that is only 
a constant factor greater than the time required to read the output of h_i ∘ Γ_{i−1}. Evaluating h_i 
is performed by O(2^i) evaluations of character tables of the form g_{i,j} ∘ Γ̂. The degree of Γ̂ is 
O(q²) and it has an evaluation time that is proportional to the degree. We therefore perform O(q²) 
lookups into the character tables of g_{i,j} where we read bit strings of length O(2^i κ). The recurrence 
describing the evaluation time of Γ_i takes the form 

T(i) ≤ 2T(i − 1) + O(2^i q²(1 + 2^i κ/w))   if i > 1, 
T(i) ≤ O(1)                                    if i = 1. 

The solution to the recurrence is O(2^i q²(i + 2^i κ/w)). 

The construction fails if T fails to be fe-unique or if Fi, ..., F^ fails to be A:-unique. According 
to Lemma [5] this happens with probability less than 2~‘^'^' + □ 

Corollary 4. There exists a randomized data structure that takes as input positive integers u, 
r = u^{O(1)}, t, k and selects a family of functions F from [u] to [r]. In the word RAM model with 
word length w the data structure satisfies the following: 

- The space used to represent F, as well as a function f ∈ F, is O(k u^{1/t} t² log(k)/w). 

- The evaluation time of f is O(t² (log(k)/log(u)) (log(log(u)/log(k)) + log(u)/w)). 

- With probability at least 1 — k~'^ we have that F is a k-independent family. 

Proof. Assume without loss of generality that k ≤ u and apply Theorem 4 with parameters λ = 
⌈log(log(u)/log k)⌉ + 1, κ = ⌈log k⌉ + 1, and q = ⌈3t log(k)/log u⌉. This gives a function Γ that is 
A;-unique over [tt] with probability at least 1 — k~‘^. We compose F with a suitable simple tabulation 
function h that maps to elements of [r]. Implementing h using Γ̂ we get the same bounds on the 
space usage, evaluation time, and probability of failure as for the data structure used to represent 
Γ. □ 


Remark. The construction in Corollary 4 presents an improvement in the case where we wish to 
minimize the space usage. For w = Θ(log u) and t = ⌈log u⌉ we get a space usage of O(k log(u) log(k)) 
and an evaluation time of O(log(u) log(k) log(log(u)/log(k))). In comparison, for these parameters 
Corollary [1] gives a space usage of 0{k\o^ u) and an evalution time of 0(log^(n) log(A:)). 
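
The figures quoted for Corollary 4 in the remark can be checked by direct substitution. The sketch below assumes that the bounds of Corollary 4 read O(k u^{1/t} t² log(k)/w) for space and O(t² (log k / log u)(log(log u / log k) + log(u)/w)) for time, and that w = Θ(log u).

```latex
\begin{align*}
t = \lceil \log u \rceil \;&\Longrightarrow\; u^{1/t} \le 2, \\
\text{space:}\quad
k\,u^{1/t}\,t^{2}\,\frac{\log k}{w}
  &= O\!\left(k \cdot \log^{2}(u) \cdot \frac{\log k}{\log u}\right)
   = O\bigl(k \log(u)\log(k)\bigr), \\
\text{time:}\quad
t^{2}\,\frac{\log k}{\log u}\Bigl(\log\frac{\log u}{\log k} + \frac{\log u}{w}\Bigr)
  &= O\!\left(\log(u)\log(k)\Bigl(\log\frac{\log u}{\log k} + 1\Bigr)\right).
\end{align*}
```

The time expression matches the bound stated in the remark whenever log(u)/log(k) ≥ 2, so that the additive constant is absorbed by the doubly logarithmic factor.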
4 Conclusion 


We have presented new constructions of k-independent hash functions that come close to Siegel's 
lower bound on the space-time tradeoff for such functions. An interesting open problem is whether 
the gap to the lower bound can be closed. From the perspective of efficient expanders it would be 
very interesting to achieve space o(k) while preserving computational efficiency. Of course, such a 
result is not possible via k-independence. 